binary: fix string parsing on big-endian hosts by amaanq · Pull Request #6605 · KhronosGroup/SPIRV-Tools

amaanq · 2026-03-17T03:12:03Z

Note

The fixes are split into individual commits to make reviewing easier

Problem

Two places in the codebase read SPIR-V string data incorrectly on big-endian hosts:

binary.cpp: When parsing a SPIR-V binary whose endianness differs from the host's (e.g. a spec-conformant little-endian binary on ppc64/s390x), the parser reads string operands from raw _.words without byte-swapping first. MakeString then extracts bytes assuming native word layout, producing garbled strings: for example, "OpenCL.std" reads as "nepOs.LC".
extract_source.cpp: The objdump source extraction used reinterpret_cast<const char*> on parsed instruction words, which gives the wrong byte order on big-endian hosts because SPIR-V strings pack characters starting from the lowest byte of each word.

Solution

binary.cpp: Byte-swap the words before passing them to MakeString when requires_endian_conversion is true, matching how other operand types are already handled via spvFixWord.
extract_source.cpp: Use MakeString instead of raw casts. MakeString extracts characters from the low bits of each word, so the result does not depend on host endianness.
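
To make the byte-swap argument concrete, here is a minimal sketch of the mechanism described above. It is not the SPIRV-Tools code: `PackString`, `ExtractString`, `SwapBytes`, and `SwapAll` are hypothetical stand-ins written for illustration, with `ExtractString` playing the role the description attributes to MakeString and `SwapBytes` the role it attributes to spvFixWord. The sketch assumes 32-bit words with character i stored in bits 8*(i%4) and up of word i/4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: packs a string into 32-bit words, lowest byte first,
// and leaves at least one zero byte at the end as the NUL terminator.
std::vector<uint32_t> PackString(const std::string& s) {
  std::vector<uint32_t> words((s.size() + 4) / 4, 0u);
  for (std::size_t i = 0; i < s.size(); ++i) {
    const uint32_t byte = static_cast<unsigned char>(s[i]);
    words[i / 4] |= byte << (8 * (i % 4));
  }
  return words;
}

// Hypothetical stand-in for a low-bits-first extractor: reads characters
// from the lowest byte of each word upward and stops at the first NUL.
std::string ExtractString(const std::vector<uint32_t>& words) {
  std::string out;
  for (uint32_t w : words) {
    for (int shift = 0; shift < 32; shift += 8) {
      const char c = static_cast<char>((w >> shift) & 0xFFu);
      if (c == '\0') return out;
      out.push_back(c);
    }
  }
  return out;
}

// Reverses the four bytes of one word.
uint32_t SwapBytes(uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
         ((w << 8) & 0x00FF0000u) | (w << 24);
}

// Byte-swaps every word, which is what reading words of the opposite
// endianness without conversion looks like as word values.
std::vector<uint32_t> SwapAll(std::vector<uint32_t> words) {
  for (uint32_t& w : words) w = SwapBytes(w);
  return words;
}
```

In this sketch, extracting from `SwapAll(PackString("OpenCL.std"))` yields "nepOs.LC": each group of four characters is reversed, and the third word now has its zero bytes in the low positions, so extraction stops early. Swapping the words back before extraction restores "OpenCL.std".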

CLAassistant · 2026-03-17T03:12:11Z

All committers have signed the CLA.

When parsing a SPIR-V binary with a different endianness than the host (e.g. a spec-conformant little-endian binary on ppc64/s390x), the parser reads string operands from raw `_.words` without byte-swapping first. `MakeString` then extracts bytes assuming native word layout, producing garbled strings; for example, "OpenCL.std" reads as "nepOs.LC". Byte-swap the words before passing them to `MakeString` when `requires_endian_conversion` is true, matching how other operand types are already handled via `spvFixWord`.

`extract_source.cpp` used `reinterpret_cast<const char*>` on parsed instruction words to read string data. On big-endian hosts, bytes within each native-endian word are in high-to-low memory order, but SPIR-V strings pack characters starting from the lowest byte of each word. Use `MakeString` instead of raw casts, which correctly extracts characters from the low bits of each word regardless of host endianness.
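
The difference between a raw cast and shift-based extraction can be sketched without a big-endian machine by modelling the memory image of the words explicitly. This is an illustration under stated assumptions, not the code in `extract_source.cpp`: `MemoryImage`, `ReadAsCString`, and `ReadLowBitsFirst` are hypothetical names, and the `big_endian` flag simulates the host instead of detecting it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: the bytes of `words` as they would sit in memory on a
// host of the given endianness (simulated, not detected).
std::vector<unsigned char> MemoryImage(const std::vector<uint32_t>& words,
                                       bool big_endian) {
  std::vector<unsigned char> bytes;
  for (uint32_t w : words) {
    for (int i = 0; i < 4; ++i) {
      const int shift = big_endian ? 24 - 8 * i : 8 * i;
      bytes.push_back(static_cast<unsigned char>((w >> shift) & 0xFFu));
    }
  }
  return bytes;
}

// Models what a raw char* view of the words sees: bytes in memory order,
// up to the first NUL.
std::string ReadAsCString(const std::vector<unsigned char>& bytes) {
  std::string out;
  for (unsigned char b : bytes) {
    if (b == 0) break;
    out.push_back(static_cast<char>(b));
  }
  return out;
}

// Shift-based extraction: depends only on word values, not on memory layout.
std::string ReadLowBitsFirst(const std::vector<uint32_t>& words) {
  std::string out;
  for (uint32_t w : words) {
    for (int shift = 0; shift < 32; shift += 8) {
      const char c = static_cast<char>((w >> shift) & 0xFFu);
      if (c == '\0') return out;
      out.push_back(c);
    }
  }
  return out;
}
```

With the words `{0x6E69616D, 0x00000000}` ("main" packed lowest byte first), the simulated little-endian memory image reads as "main", the simulated big-endian image reads as "niam", and `ReadLowBitsFirst` returns "main" in both cases because it never looks at memory order.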

dneto0 · 2026-04-14T21:34:39Z

Thanks for your patience awaiting my review.

Hi, please take a look at the analysis and experiments at #5302 (comment)

There I show that the binary parser handles both big-endian and little-endian binaries.

I haven't looked at source extraction. Please provide examples showing there is a problem, and also provide tests with any suggested fixes.

amaanq force-pushed the fix-big-endian-strings branch from 37f786b to 9103f43 · March 17, 2026 22:56

amaanq force-pushed the fix-big-endian-strings branch from 9103f43 to cafde04 · March 18, 2026 00:19

amaanq force-pushed the fix-big-endian-strings branch from cafde04 to 133e93b · March 18, 2026 00:26

s-perron requested a review from dneto0 · March 26, 2026 15:31

amaanq wants to merge 2 commits into KhronosGroup:main from amaanq:fix-big-endian-strings
